Split pragmatics into presuppositions and scalar implicatures #2938
Hi @weiqipedia, for your info.
Looks good overall. Note that you have to change schema_bhasa.yaml to reflect the changes (but that can be done in a separate pull request).
)
# Split "True or False" into ["True", "or", "False"]
choices = row["choices"].split()
choices_translated = row["choices_translated"].split()
Does this work consistently across every (supported) language?
That's a good question! For now we only have Indonesian (and Tamil), and splitting on whitespace and taking the first and third elements of the list works for both languages. Just FYI, this will not work for Thai, which is written without spaces between words, so we'll have to use something closer to your suggestion of splitting on " or " (but we will not be adding Thai any time soon).
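For illustration, here is a minimal sketch of the separator-based alternative mentioned above, compared with the whitespace split in the code under review. The helper name `split_choices` and the Indonesian separator `" atau "` are assumptions made for this example; they are not taken from the pull request.

```python
def split_choices(choices: str, separator: str) -> tuple[str, str]:
    """Split a two-option string such as "True or False" on an explicit separator.

    Hypothetical helper: the separator would have to be supplied per language.
    """
    first, sep, second = choices.partition(separator)
    if not sep or not first.strip() or not second.strip():
        raise ValueError(f"Cannot split {choices!r} on separator {separator!r}")
    return first.strip(), second.strip()


# Whitespace splitting, as in the reviewed code: take the first and third tokens.
tokens = "True or False".split()
whitespace_result = (tokens[0], tokens[2])

# Separator-based splitting gives the same answer for single-word options.
separator_result = split_choices("True or False", " or ")

# With a multi-word option, only the separator-based split keeps the option whole.
multi_word_result = split_choices("Not true or False", " or ")
```

The separator-based version does not depend on each option being a single token, but it still needs a known separator string for every supported language.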
        if self.language not in self.prompts.keys():
            raise (Exception(f"Unsupported language {self.language} - supported languages are {self.prompts.keys()}"))
        else:
            self.prompt_components = self.prompts[self.language]

    def download_dataset(self, output_path: str):
        BASE_URL = "https://raw.githubusercontent.com/aisingapore/BHASA/main/lindsea/"
Optional: You can pin this to a specific commit hash so that future changes to the repository won't cause this scenario to change, e.g.
BASE_URL = "https://raw.githubusercontent.com/aisingapore/BHASA/10e34008e8142bef400cf8ffab15b2b6aaf3aa7f/lindsea/"
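As a rough sketch of how the pinned base URL could be combined with the per-subset path used later in this diff: the constant name `BHASA_COMMIT`, the helper `subset_url`, the language code `"id"`, and the subset name `"example_subset"` are all hypothetical and chosen only for illustration.

```python
# Commit hash taken from the review suggestion; pinning it keeps downloads reproducible.
BHASA_COMMIT = "10e34008e8142bef400cf8ffab15b2b6aaf3aa7f"
BASE_URL = f"https://raw.githubusercontent.com/aisingapore/BHASA/{BHASA_COMMIT}/lindsea/"


def subset_url(language: str, subset: str) -> str:
    """Build the download URL for one pragmatics subset (hypothetical helper)."""
    # Lowercase local name, since this value is computed per call and is not a constant.
    url = f"{BASE_URL}{language}/pragmatics/pragmatic_reasoning_{subset}.jsonl"
    return url


# "id" and "example_subset" are placeholder values, not names confirmed by the PR.
example = subset_url("id", "example_subset")
```

Keeping the hash in a single constant means a later dataset update only requires changing one line.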
0ab8fc3 to 3d88380
LGTM, thanks!
dataset = pd.read_json(target_path_file, lines=True)
datasets = []
for subset in self.subsets:
    URL = f"{BASE_URL}{self.language}/pragmatics/pragmatic_reasoning_{subset}.jsonl"
nit: URL should be lowercase (it is not a constant)
No description provided.